A simple and robust method for isolating and analyzing chromatin-bound RNAs in Arabidopsis

Background Chromatin-bound RNAs are the primary product of transcription that undergo on-chromatin processing such as capping, splicing, and polyadenylation. These processing steps then determine the fate of RNAs. Albeit its vital importance, a simple and robust method for isolating different fractions of chromatin-bound RNAs is missing in plants. Result Here, we describe our updated method and the associated step-by-step protocol for chromatin-bound RNAs isolation in A. thaliana. The chromatin-bound RNAs isolation is based on the 1 M UREA wash that removes the majority of non-chromatin-associated proteins from the nucleus, as previously developed in mammalian cells. On-demand, the isolated chromatin-bound RNAs can be either used directly for gene-specific analysis or subject to further rRNA removal and also the optional polyadenylated RNA removal, followed by high-throughput sequencing. Detailed protocols for these procedures are also provided. Comparison of sequencing results of chromatin-bound RNAs with and without polyadenylated RNA removal revealed that a small fraction of CB-RNAs is polyadenylated but not yet fully spliced, representing RNA-processing intermediate on-chromatin. Conclusion This optimized chromatin-bound RNAs purification method is simple and robust and can be used to study transcription and its-coupled RNA processing in plants. Supplementary Information The online version contains supplementary material available at 10.1186/s13007-022-00967-y.


Background
As sessile organisms, plants perceive the fluctuant environmental stimuli and adjust their internal gene expression constantly to be able to adapt to the local conditions. To a certain extent, plants' phenotypic plasticity is determined at the transcriptional level. However, transcription is a rather dynamic process; the steady-state RNA levels are an outcome of transcription initiation, productive elongation, and termination, followed by RNA processing and decay. Although conventional mRNA-seq provided a robust way to convey transcription control, it is only by directly tracking newly synthesized RNA that the instant changes in plant responses to developmental, environmental stimuli, and disease can be revealed.
Over the past decades, a series of high-throughput sequencing methods have been developed for investigating nascent RNA molecules from the pool of cellular RNA [1]. According to the different strategies used for the enrichment of nascent RNA, those methods can be classified as CB-RNA-seq (Chromatin-Bound RNA-seq), † Qiqi Zhang and Fengli Zhao are contributed equally to this work *Correspondence: wuz@sustech.edu.cn; zhudl@sustech.edu.cn which involves the extraction of chromatin-associated RNAs by rigorous urea wash steps [2][3][4][5][6]; NET-seq (Native Elongating Transcript sequencing), which isolates the Pol II-associated RNAs and requires immunoprecipitation of Pol II [7,8]; GRO-seq (Globe nuclear Run-On sequencing), which can capture the RNAs from elongation competent Pol II complex by performing a run-on with isolated nuclei [9,10]; TT-seq (Transient Transcriptome sequencing) that involves a brief exposure of cells to the nucleoside analog 4-thiouridine (4sU), and enables the mapping of transcription active region and the estimation of RNA synthesis and degradation [11]. The techniques mentioned above were used to answer questions about transcriptional regulation on the nascent RNA level, and each has its strength and limitation. Notably, despite the widespread nascent RNA-related research in other model systems, research based on nascent RNA in plants was just emerging owing to technical difficulties.
Since 2015, the GRO-seq method has been modified and adapted to profile the nascent transcripts from Pol II and Pol V in different plant species [12][13][14][15]. Pol II dynamics in Arabidopsis seedlings were also comprehensively studied by pNET-seq, a method that employs immunoprecipitation with different antibodies targeting Pol II isoforms with different CTD phosphorylation [15]. It was shown that Pol II with different phosphorylated CTD gave distinct association patterns [15]. By plaNETseq, a NET-seq method that employs immunoprecipitation of total Pol II from a transgenic tagged line, it was reported that the promoter-proximal Pol II stalling function to predisposing genes for transcriptional activation upon cold exposure [16]. Notably, although NET-seq and GRO-seq efficiently capture RNAs associated with elongating Pol II complex, these methods involve Pol II immunoprecipitation with high affinity Pol II antibody (NET-seq), or Br-UTP incorporation into the nascent RNA during in vitro nuclear run-on (GRO-seq). These steps dramatically increase the difficulty of operations.
We have initially adapted the CB-RNAs isolation method for the study of FLC in Arabidopsis [17]. Combined with mathematical modeling, the unique CB-RNA distribution pattern along FLC intron1 revealed autonomous pathway components such as FCA and FLD functions to coordinately repress transcription initiation and elongation [17]. Recently, by CB-RNAs isolation in combined with Illumina high-throughput sequencing, we and others showed that co-transcriptional splicing was widespread in A. thaliana [18,19]. Since then, CB-RNAs isolation was considered the most straightforward approach to obtain nascent RNAs in plants and a number of studies have successfully applied this method for CB-RNAs isolation [20][21][22]. Compared with other approaches, isolating CB-RNAs only involves two washes using a urea-containing buffer after nuclei extraction; hence it is much more straightforward and can be combined with a gene-targeted approach, next-generation sequencing, or long-read sequencing to investigate transcription and its coupled pre-mRNA splicing. With minor modifications, CB-RNA isolation method should be applicable in a wide range of plant species as already been demonstrated in soybean [20]. Albeit the great simplicity and durability of CB-RNAs isolation, a detailed description of the principles and cautions of this method for plant is missing.
Here, we describe our up-to-date method for chromatin-bound RNAs isolation from Arabidopsis thaliana seedlings. Combined with quantitative PCR or nextgeneration sequencing, the isolated CB-RNAs is suitable for analyzing the co-transcriptional intron splicing status both at the single gene and genome-wide level. For a given locus, the CB-RNA reflects the sum of elongating RNA (RNAe: RNAs being transcribed at the locus) and full-length polyadenylated RNA (RNAf: RNAs have been fully transcribed and polyadenylated but not yet released from chromatin). Hence, we also provided a step-bystep method to separate those two CB-RNA fractions and analyzed the corresponding distinct features. Our data showed that polyadenylated RNA likely represents the intermediate of on-chromatin RNA processing and accounts for only a small fraction of CB-RNAs. We thus propose a guideline on when the rRNA and or polyadenylated RNA needs to be removed before downstream analysis. Our work provided a simple and robust CB-RNAs isolation method in Arabidopsis, which in principle can be easily adapted to be used in a wide range of plant species.

Isolation of chromatin-bound RNAs from Arabidopsis seedlings
After transcription initiation, RNA polymerase II (Pol II) forms a stable complex with DNA template which resists nonionic detergent and urea. Taking advantage of this feature, Pol II associated nascent RNAs were successfully isolated from the free cytoplasmic and nucleoplasmic RNAs in different model systems, including mammals, Drosophila, and yeast. Based on these pre-existing knowledge, we describe a method suitable for CB-RNAs extraction in Arabidopsis and likely other plant species such as soybean. The extraction procedures mainly involve a nuclei extraction step followed by two stringent wash steps with a buffer containing urea (Fig. 1A). The resulting pellet was enriched with elongating nascent RNAs bound by Pol II, as well as some polyadenylated that is not yet released from chromatin. As the quality control, the protein samples from each extraction step were isolated and assessed by western blot using antibodies specific to Pol II and histone 3 (H3), which are markers of chromatin. A ubiquitin-conjugating enzyme localized in the cytoplasm, UBC1, served as another control. We found that most Pol II and H3 were detected in the chromatin fraction, while UBC1 was found mainly in the total and cytoplasm fraction (Fig. 1B), suggesting chromatin-associated protein was successfully enriched in the chromatin fraction. To check the integrity of the isolated RNAs, each fraction was checked by electrophoresis. 2% lowmelted agarose gel and SYBR gold fluorescent dye were used to achieve great sensitivity and save the amount of CB-RNAs used. Notably, the 28S and 18S rRNA bands were clearly observed in chromatin bound RNAs fraction, indicating the integrity of CB-RNAs (Fig. 1C).

Assessment of nascent RNA enrichment in the isolated CB-RNAs by qPCR
CB-RNAs isolation captures all the chromatin-associated RNA species, including the nascent transcripts containing introns that are going to be, but not yet spliced out. This allows for studying co-transcriptional processes, such as co-transcriptional intron splicing (CTS) at a single gene or in the whole genome. Here, to evaluate the enrichment of nascent RNA in different extraction fraction, the ratio of unspliced RNA to spliced RNA of a specific intron at 5 randomly selected genes were estimated (Fig. 2). Primer pairs that amplify regions correspond to intron-exon junction and exon-exon junction were designed to pick up the unspliced and spliced RNA, respectively. With these primers, the quantitative Note that the UBC1 is absent from the nucleoplasm and chromatin fraction, while H3 and Pol II are highly enriched in the chromatin fraction. C Gel-electrophoresis detection of RNAs in different fractions. Note that 28S and 18S rRNA bands can be observed clearly in different fractions, indicating RNA degradation is minimal during extraction real-time PCR were then performed on different fractions that were kept during the CB-RNAs isolation procedure ( Fig. 1A), that is, the nuclei fraction that was kept after nuclei extraction, the wash I fraction that resembles the nucleoplasm, the wash II fraction that contains contaminated nucleoplasm and chromatin fraction that are released by the second wash, and the CB-RNA fraction. As shown in Fig. 2, the ratio of unspliced RNA to the spliced RNA at the given exon-intron-exon unit in the wash I and wash II fractions stays at a relatively similar value to that in total RNA, suggesting the stable association between nascent transcripts and chromatin during urea washing. Importantly, the ratio of unspliced RNA to the spliced RNA is higher in the nuclei fraction compared with that in the total RNA fraction and is highest in the CB-RNA fraction, suggesting the successful enrichment of nascent RNAs.

Workflow of sequencing library construction for CB-RNA-seq
To investigate the intron splicing status on a genomewide scale, we performed CB-RNA-seq. As shown in Fig. 1C, ribosomal RNAs account for the majority of isolated CB-RNAs, indicating that the removal of rRNA prior to sequencing library construction is needed. Apart from the ribosomal RNAs, we reasoned that another major class within CB-RNAs is the RNAs bound by Pol II that have undergone elongation (RNAe, elongating RNAs). In addition, the isolated CB-RNAs could contain polyadenylated RNAs (RNAf, full-length RNAs) that may represent contaminated mature mRNA and or nascent RNAs that were already polyadenylated but not yet released from chromatin. To determine the relationship between RNAe and RNAf and better understand their biological relevance, we constructed CB-RNA-seq In fashion I (RNAe + RNAf ), the isolated CB-RNAs were subject to rRNA removal using RiboPool oligo probes that target rRNA and then used for RNA-seq library construction (Fig. 3). In the second fashion (RNAe), the CB-RNAs were subjected to polyadenylated RNA removal using oligo d(T) probes (Fig. 3). The resulting CB-RNAs were then subject to rRNA removal and followed by sequencing library construction. In the third fashion (RNAf ), the chromatin-bound polyadenylated RNAs, as isolated in the fashion II, were subject to sequencing library construction (Fig. 3). For all three fashions, strand-specific sequencing library construction was performed with NEBNext Ultra II Directional RNA Library Prep Kit based on the dUTP method [23,24] (Fig. 3). A sequencing library of the mature polyadenylated mRNA was also made as a control. The resulting strand-specific library was then subject to paired-end sequencing using an Illumina platform (Table 1).

Nascent RNA profiling of Arabidopsis seedling by CB-RNA-seq
We first analyzed the sequencing reads distribution along genes of CB-RNA-seq data with only the rRNA depleted but with polyadenylated RNA kept (RNAe + RNAf ). Compared with mRNA-seq data, two distinct features were observed in CB-RNA-seq data. Firstly, the read density of CB-RNA-seq declined gradually from 5ʹ to 3ʹ end, while no such pattern was observed in mRNA-seq ( Fig. 4A and B). Assuming that elongating RNA dominates the CB-RNA fraction, there would be more nascent RNA at gene 5ʹ end than 3ʹ end, given RNA synthesis always occurs from 5ʹ end to 3ʹ end. Instead, a slightly opposite trend was observed in the mRNA-seq data, likely due to enrichment of RNA 3ʹ end when purifying the mRNA using oligo d(T). Secondly, as we expected, CB-RNA-seq detects much more intron reads than mRNA-seq due to the capture of nascent RNAs with intron not yet removed (Fig. 4A). Consistently, the ratio of intronic reads to exonic reads in CB-RNA-seq is much higher than that in mRNA-seq (Fig. 4C). To precisely calculate the co-transcriptional splicing (CTS) efficiency, we adopted a 3ʹSS/5ʹSS ratio to measure the extent of co-transcriptional splicing [4,25,26]. Briefly, at the given intron-exon boundary, we calculated the ratio of intronic reads over the adjacent exonic reads to yield a 5ʹ splicing site ratio (5ʹSS ratio) and a 3ʹ splicing site ratio (3ʹSS ratio) (Fig. 4D). Thus, the 5ʹSS/3ʹSS ratio reflect intron retention levels at the chromatin and is negatively correlated with splicing efficiency. As we expected, the median value for CB-RNA was around 0.25, which is in line with previous results [18,19] and it was significantly higher than that in mRNA-seq (Fig. 4E), suggesting that splicing predominantly occurs at the co-transcriptional level.
A small proportion of CB-RNAs are polyadenylated and generally represent RNA processing intermediates on chromatin.
Next, we wondered about the relationship between RNAe and RNAf in the isolated CB-RNAs. Thus, we analyzed and compared the sequencing results obtained from RNAe + RNAf, RNAe, and RNAf (Figs. 3 and 5A). We found that the 5ʹ SS and 3ʹ SS ratio in RNAe is only slightly higher than that of RNAe + RNAf (Fig. 5B), suggesting that RNAf only accounts for a minor proportion of CB-RNAs. Indeed, the 5ʹ to 3ʹ declining pattern in RNAe + RNAf also supports the above conclusion (Fig. 5B). The SS ratio of RNAf is much lower than that of RNAe or RNAe + RNAf (Fig. 5B), suggesting that the introns in RNAf are largely removed, consistent with the fact that these RNAs are already polyadenylated. Notably, the SS ratio of RNAf is still significantly higher than that of mature mRNA (Fig. 5B), suggesting RNAf is most likely an intermediate of RNA processing on chromatin instead of the contaminated mature mRNAs during CB-RNAs isolation.
Since multiple factors could also affect CTS efficiency, such as the intron positions at a given gene and intron numbers, we looked at how CTS efficiency correlates with the order of intron within genes. We extracted genes with the same intron numbers and analyzed the corresponding 3ʹSS and 5ʹSS at each individual intron from RNAe, RNAf, and RNAe + RNAf (Fig. 5C). The results showed that both the 3ʹSS and 5ʹSS ratios were gradually increased from the first intron to the last intron (Fig. 5C, and Additional file 2: Fig. S1), consistent with a "first come, first serve" model of CTS [27]. Notably, such a trend was observed clearly in the RNAe (Fig. 5C), indicating that the removal of RNAf when performing CB-RNA-seq could be beneficial when accurate analysis of CTS is required. In addition, the absence of such a trend in RNAf and, to a certain extent, also in RNAe + f B Boxplot showed splicing ratio of 5ʹ SS and 3ʹ SS ratio of RNAe, RNAf, and RNAe + RNAf. ***p < 0.001, Wilcoxon test. The boxplot showed the median and the 25th and 75th percentiles. C The relationship between co-transcriptional splicing (CTS) and intron order in RNAe, RNAf, and RNAe + RNAf. The genes containing 9 introns were selected as representatives. X-axis indicated the intron order from 1st to 9th. The analyzed gene number was indicated at the top of each box. D The relationship between co-transcriptional splicing (CTS) and intron number in RNAe, RNAf, and RNAe + RNAf. The genes containing 1 to 10 introns were selected as representatives. X-axis indicated the intron number from 1 to 10. The analyzed gene number was indicated at the top of each boxplot suggests that nascent RNAs could be held at chromatin for a substantial time post active elongation, enabling effective on-chromatin splicing (Fig. 5C).
Next, as intron numbers affect splicing efficiency [18], we plotted the 3ʹSS and 5ʹSS ratios with the genes containing different intron numbers. The results showed that the numbers of introns/exons negatively correlated with the average 5ʹSS and 3ʹSS ratio in all three fractions of CB-RNAs (Fig. 5D), suggesting strong cooperativity of splicing among different exons/introns within a gene. Notably, the fact that such cooperativity can also be seen in RNAf but not in mature mRNA, further strengthens the conclusion that RNAf represents RNA processing intermediates that are polyadenylated but still undergo processing steps such as RNA splicing on chromatin (Fig. 5D).

Discussion
Nascent RNA sequencing has been widely used to study the co-transcriptional regulation in many species. However, isolating the high-quality nascent RNAs from plant tissue is still challenging due to the presence of cell walls and other reasons. Several methods, such as CB-RNA-seq, pNET-seq, and GRO-seq, have been developed recently to detect nascent RNA and reveal plant-specific transcriptional features. In this study, we provided a detailed protocol for isolation of the chromatin-bound RNA from A. thaliana seedlings. Given the unique advantages of this approach, the CB-RNA extraction method is considered the most straightforward way for nascent RNA isolation. Firstly, the procedure is relatively simple, time-saving, and can be done with limited materials (One gram of seedlings are sufficient for most of applications). The entire CB-RNA isolation procedure can be done within 2-3 h from raw materials to purified RNAs. Secondly, the method is more economical compared with other parallel methods such as pNET-seq and GRO-seq given that CB-RNAs isolation does not involve immunoprecipitation steps that are time-consuming and expensive. Thirdly, once the CB-RNAs is obtained, it can be handled the same way as the total RNA, which is compatible with all sorts of downstream applications or sequencing library construction strategies. For example, the isolated CB-RNAs can be combined with specific downstream methods to assay 3' end processing, RNA methylation, RNA editing, anti-sense RNA, circular RNA, and so on at the chromatin level. CB-RNA isolation method should be applicable in a wide range of plant species as illustrated successfully in Soybean [20]. However, extract CB-RNAs from seeds or the other tissues that are rich in secondary metabolites (such as lipid, starch…), extra steps might be needed to remove those distractors before better CB-RNA isolation.
There are also limitations for CB-RNAs isolation and sequencing. Firstly, our current protocol only recovers CB-RNAs above 200nt. This is due to the column purification of CB-RNA (with 200nt cut-off for the Qiagen RNA miniprep kit) and also the size selection in the Illumina RNA-seq library generation step. Thus, caution and adjustment of the protocol will be needed when the nascent RNAs below 200nt are of interest. Secondly, in association with the first point, the CB-RNAs is suitable for studying nascent RNA and its associated RNAprocessing steps but less suitable for looking at the Pol II occupancy. For example, Pol II pauses at both 5ʹ and 3ʹ end of the gene in Arabidopsis [15], while the paused Pol IIs are likely associated with nascent RNA smaller than 200nt, which would not be resolved effectively in CB-RNA-seq. Indeed, both the 5ʹ and 3ʹ end pause of Pol II were not observed in our CB-RNA-seq data. For accurate Pol II profiling, NET-seq is recommended. Thirdly, the inactivation of endogenous RNAase during the CB-RNA isolation procedure is critical and sometimes can cause problem. Plant cells contain rich sources of endogenous RNase. In the current protocol, high concentrations of RNase inhibitors and yeast tRNAs are employed to buffer against endogenous RNase activity. In addition, the handling time from cell lysis to the urea washing steps needs to be minimized. Thus, we recommend only handling one or two samples simultaneously for these steps. After the UREA washing steps (Fig. 1A), all the samples could be handled together for the following RNA extraction.
Our data showed that a small fraction of full-length polyadenylated RNA (RNAf) is included in the isolated CB-RNAs (Fig. 5D). The incomplete splicing status, plus the observed cooperativity among different intron splicing in this RNA fraction, suggest these RNAs are most likely the RNA processing intermediates that are polyadenylated but not yet released from chromatin (Fig. 5D). Thus, they represent a fraction of RNAs that sit on chromatin but not the contamination during CB-RNAs purification. Thus, in most cases, it is unnecessary to remove these polyadenylated RNAs prior to sequencing library construction or qPCR-based analysis. However, the removal of polyadenylated RNA is needed when only the elongating RNA is of interest. The presence of polyadenylated but not fully spliced RNA on chromatin suggests there is no strict order between polyadenylation and splicing. In addition, gene splicing status might be monitored by a mechanism linked with Pol II at gene 3' end such that transcripts with unspliced intron are not effectively released from chromatin even when it is polyadenylated. An exciting hypothesis that awaits to be explored in the future.

Conclusions
We described a simple and robust method to enrich chromatin-bound RNAs that include nascent RNA undergone elongation by Pol II and polyadenylated RNA that has undergone RNA processing on chromatin. The isolated chromatin-bound RNAs are compatible with all sorts of downstream applications, such as the studying of co-transcriptional RNA-processing combined with nextgeneration sequencing.

Plant material and growth conditions
Arabidopsis thaliana (Col-0) seeds were sterilized in 10% sodium hypochlorite for 10 min and rinsed by sterile distilled water 3 to 4 times before being sown on 1/2 MS media. Seeds were stratified at 4 °C for 3 days and transferred to a growth chamber (Percival CU-36L5), 16 h light/8 h dark, 22 °C. 10-day-old seedlings were collected and snap-frozen in liquid N2 for the following chromatin-bound RNAs (CB-RNAs) extraction.

Chemicals and reagents
Note that RNase-free chemicals and solution are critical for CB-RNA isolation.

Buffer compositions
For CB-RNA isolation:

Wash buffer
25 mM Tris-HCl pH 7.5, 300 mM NaCl, 1 M Urea, 0.5 mM EDTA, 1% Tween 20 Solution is filtered through 0.45 μm nylon membrane and can be kept at − 20 °C for up to 3 months.
For rRNA removal:

Depletion buffer
Protocol for Chromatin-bound RNA extraction Before start • Use RNasezap to clean the bench and pipette before starting. • Pre-cool the centrifuge to 4 °C. • Prepare working solutions (e.g., Honda buffer 10 mL/sample; Nuclei resuspension buffer 800 μL/ sample; Wash buffer 1.2 mL/sample) freshly before use. • Given that endogenous RNAase can't be inactivated thoroughly during the procedure, handling one sample each time is recommended.

Chromatin extraction (~ 2 h)
1. Take 1 g of Arabidopsis seedlings and grind them in liquid nitrogen to a fine powder. 2. Resuspend the ground powder with 8 mL precooled Honda buffer and homogenize quickly by stirring with a 1 mL tip. Filter the solution through two layers of Miracloth into a new ice-cold 50 mL falcon tube. (Note: 2 mM DTT, 1 × tRNA solution, 80 μL RNase inhibitor, and 1 × proteinase inhibitor cocktail are added into 8 mL Honda buffer just before use). 3. Centrifuge the filtrate immediately at 4 °C, 3500g for 5 min. 4. Remove the supernatant, resuspend the pellet immediately with 1 mL pre-cooled Honda buffer, and transfer to a new ice-cold 2 mL tube. (Note: 2 mM DTT, 5 × tRNA solution, 10 μL RNase inhibitor, and 1 × proteinase inhibitor cocktail are added into 1 mL Honda buffer just before use). 5. Centrifuge immediately at 4 °C, 8000g for 1 min. 6. Remove the supernatant thoroughly and weigh the pellet. 7. Resuspend the pellet with 1 volume (w/v) of nuclei resuspension buffer. (e.g., resuspend 100 mg nuclei pellet with 100 μL nuclei resuspension buffer). Stir to mix the pellet and pipet up and down several times with an end-cut 200μL tip. (Note: 2 mM DTT, 5 × tRNA solution, 2 × proteinase inhibitor cocktail, and 1 μL RNase inhibitor in every 50 μL buffer are added into nuclei resuspension buffer just before use). 8. After homogenization, add 2 volumes of wash buffer into the above mixture, and pipette gently 20 times with an end-cut 1 mL tip to ensure samples are thoroughly mixed. (Note: Take ~ 1/10 resuspended nuclei at this step and snap frozen in liquid nitrogen if the nuclear fraction of RNA needs to be assessed by qPCR analysis.) 9. Store the mixture on ice for 1 min. 10. Centrifuge immediately at 4 °C, 8000g for 1 min.
Remove the supernatant thoroughly. (Note: Keep the supernatant as wash I and snap frozen in liquid nitrogen if this RNA fraction needs to be assessed by qPCR analysis.) 11. Resuspend the pellet again with 1 volume of nuclei resuspension buffer. Stir to mix the pellet and pipette gently several times with an end-cut 200 μL tip. 12. After homogenization, immediately add 1 volume of wash buffer to the mixture and pipette up and down 20 times with an end-cut 1 mL tip to ensure samples are thoroughly mixed. 13. Store the mixture on ice for 1 min. 14. Centrifuge immediately at 4 °C, 12,000g for 2 min.
Remove the supernatant thoroughly.

Next-generation sequencing and bioinformatic analysis
Paired-end sequencing was performed on the bar-coded library using an Nova Seq 6000. The adapters, Ns and lowquality bases were removed from raw data using the Trimmomatic [28] package (version 0.39). Then, the reads shorter than 36 bp were also removed. And the clean reads were mapped to the TAIR10 genome using the HISAT2 (version 2.2.0) [29] with default parameters. The reads with more than one reported alignment were excluded. The mapped reads were sorted and indexed by SAMtools [30]  were used for the calculation of 5' SS or 3' SS ratio. And the calculation method was same as the previous study [18]. The structure of the longest transcript was used as representative gene structure. And the average SS ratio of all the introns represents the gene's SS ratio. For metagene profiling, TPM was calculated with a 5-bp sliding window, and profiles were visualized using plotProfile in deepTools [32]. Although a strand-specific RNA-seq library was generated with commercial kit from NEB (E7760), it is worth noting that for the Arabidopsis genome, antisense transcription is quite rare. Hence for make metagene profiling, any reads that belongs to antisense transcription for genes across the genome were kept. The stand-specific profile (with antisense transcription filtered out) looks similar to non-strand-specific profile (data not shown). The detailed pipeline of bioinformatic analysis can be find in Additional file 1.